ABSTRACT

We use the GALICS hybrid model of galaxy formation (Hatton et al. 2003) to
explore the nature of galaxy clustering in the local universe. We bring the
theoretical predictions of our model into the observational plane using the
MOMAF software (Blaizot et al. 2005) to build mock catalogues which mimic SDSS
observations. We measure low- and high-order angular clustering statistics from
these mock catalogues, after selecting galaxies in the same way as for the
observations, and compare them directly to estimates from SDSS data. Note that
we also present the first measurements of high-order statistics on the SDSS
DR1. We find that our model is in good general agreement with observations in
the scale/luminosity range where we can trust the predictions. This range is
found to be limited (i) by the size of the dark matter simulation used, which
introduces finite volume effects at large scales, and (ii) by the mass
resolution of this simulation, which introduces incompleteness at apparent
magnitudes fainter than r ~ 20.

We then focus on the small scale clustering properties of galaxies and
investigate the behaviour of three different prescriptions for positioning
galaxies within haloes of dark matter. We show that, within groups and clusters,
galaxies are poor tracers of both DM particles and DM sub-structures. Instead,
SDSS data tell us that the distribution of galaxies lies somewhere in between
these two populations. This confirms the general theoretical expectation from
numerical simulations and semi-analytic modelling.



1 INTRODUCTION 



Understanding galaxy biasing has become one of the most exciting challenges of
galaxy formation theories, especially given the overwhelming data sets that are
being acquired at many wavelengths, e.g. with the Sloan Digital Sky Survey
(SDSS, York et al. 2000), and other large scale or deep surveys. Comprehension
of galaxy biasing can help us in using large scale structure (LSS) surveys to
constrain cosmological parameters. Or, the other way around, assuming that the
cosmology is known, galaxy clustering sets fundamental constraints on models of
galaxy formation. It is the second line that this paper follows.

Two fundamentally different approaches are being used to investigate galaxy
clustering from a theoretical viewpoint. The first one consists in running
cosmological simulations that describe both the dark matter and the baryonic
components of the Universe (e.g. Pearce et al. 1999; Cen & Ostriker 2000;
Pearce et al. 2001; Yoshikawa et al. 2001; Weinberg et al. 2004). This method,
although it describes the processes of galaxy formation in the cosmological
context in the most realistic manner, suffers from its computational expense.
As a result, large scale clustering can only be explored at the price of small
scales, or, in other words, one has to choose between volume and mass
resolution. It nevertheless remains the only way to describe DM and baryons in
a fully consistent manner (up to the resolution limits).
The second approach gathers a large variety of implementations of the so-called
halo model. As is well illustrated by Peacock & Smith (2000), the philosophy
here is that galaxy clustering stems from three ingredients only, namely,
(i) halo clustering properties, (ii) halo occupation distribution, and
(iii) spatial distribution of galaxies within haloes. While modelling the
spatial distribution of haloes has become routine with the increasing number of
N-body simulations, the way to populate these haloes with galaxies is still a
matter of debate. One can basically distinguish two routes among the methods
for populating DM haloes with galaxies. The first one is based on biasing
schemes: given the halo mass, one uses phenomenological bias prescriptions to
assign a number of galaxies of given type and luminosity to a halo. Examples of
this so-called "halo occupation distribution" (HOD) method are e.g. Jing et al.
(1998), Peacock & Smith (2000), Somerville et al. (2001), Scoccimarro et al.
(2001), Scoccimarro & Sheth (2002), Berlind & Weinberg (2002), Yang et al. The
second route uses semi-analytic models (SAMs) of galaxy formation to produce a
physically motivated distribution of galaxies within haloes. The SAM can either
be fed with semi-analytic halo merger trees (e.g. Kauffmann et al. 1997; Benson
et al. 2000, 2001) or merger trees directly extracted from cosmological DM
simulations (e.g. Kauffmann et al. 1999a; Helly et al. 2003; Hatton et al.
2003). In all cases, the spatial distribution of haloes is taken from N-body
simulations.

The last unknown in the framework of the halo model is then the spatial
distribution of galaxies within the haloes they populate. Much work has
recently been done to understand the nature of this distribution relative to
the distribution of DM, and it now seems clear that galaxies sample sub-haloes
within clusters in a non-trivially biased manner (e.g. Springel et al. 2001;
Gao et al. 2004; Nagai & Kravtsov 2005). The bias arises because sub-haloes are
stripped much more efficiently than the galaxies they harbour as they orbit
within the main halo's potential well, which gives rise to a mass-to-light
ratio that decreases steeply towards the halo centre. Here the HOD and SAM
routes differ again. On the one hand, the HOD formalism distributes galaxies as
a function of an instantaneous view of the DM distribution. It is thus not
suited to describe the above evolutionary process, and HOD implementations
usually assume that galaxies are distributed as the DM particles within each
halo, with the exception of the most massive galaxy, which is forced to lie at
the centre of its host halo. Note that Gao et al. (2004) suggest that this is a
very good approximation, although Nagai & Kravtsov (2005) find somewhat
different results. On the other hand, SAMs generally attempt to predict the
galaxy spatial distribution with a more or less detailed modelling of the
dynamical processes that shape it. This either involves semi-analytic
prescriptions that describe dynamical friction and how halo mergers affect
galaxy orbits (e.g. Hatton et al. 2003), or DM-based treatments in which
galaxies typically follow the most bound particle of the halo in which they
were formed (e.g. Kauffmann et al. 1999a,b; Diaferio et al. 1999, 2001; Mathis
et al. 2002; Mathis & White 2002; Helly et al. 2003) or even DM sub-haloes^1
(Springel et al. 2001; De Lucia et al. 2004; Gao et al. 2004).

The objective of this paper is to improve on previous theoretical studies of
galaxy clustering in the following directions. First, we use the
state-of-the-art GALICS model of galaxy formation (Hatton et al. 2003) to
populate DM haloes from a cosmological N-body simulation. This model describes
galaxy formation with semi-analytic prescriptions applied within halo merger
trees extracted from that DM N-body simulation, and thus provides us with a
physically motivated halo occupation distribution (HOD). Second, we use the
MOMAF software (Blaizot et al. 2005, hereafter MOMAF) to construct mock
catalogues that mimic the SDSS early data release and DR1, both in geometry and
photometric



^1 Because of finite mass resolution and efficient tidal stripping
of sub-structures, the sub-haloes harbouring galaxies cannot all 
be followed in practice. A proxy is then necessary : when a sub- 
halo disappears, the galaxy it contained follows the most bound 
particle of this sub-halo, identified before the halo vanishes. 



selection. These mock catalogues enable us to carry out a direct comparison of
angular galaxy clustering statistics with those observed in the SDSS. Note that
comparisons of hybrid models and observations have already been performed in
the "observational plane" in the past (Diaferio et al. 1999; Mathis et al.
2002). Third, we extend the comparison to high order statistics such as the 3-
and 4-point angular correlation functions. Fourth, we investigate how
clustering statistics are affected by the spatial distribution of galaxies
within haloes, that is, what the SDSS data tell us about the small scale
distribution of galaxies. To this end, we compare the results obtained with
three different galaxy distributions: (i) the one predicted by the "progenitor
position interpolation" scheme implemented in the standard version of GALICS,
(ii) one where galaxies follow dark matter density within haloes, and (iii) one
where galaxies are distributed as DM substructures. Finally, as part of the
GALICS series, a by-product of this paper is the validation of the combined
tools GALICS and MOMAF concerning their ability to predict spatial and angular
clustering of galaxies in a more general context, i.e. in the framework of
forthcoming extra-galactic surveys. This is particularly meaningful since all
the data used in this paper are available from the GALICS web-page^2, in the
form of a relational database (see Blaizot et al. 2005).

The paper is organised as follows. In Sec. 2 we review the characteristics of
GALICS and MOMAF which are relevant to the present study. In Sec. 3 we discuss
the two-point angular correlation function. We then discuss higher order
clustering statistics, and finally discuss our results and conclude.



2 SIMULATION AND MOCK CATALOGUES 

GALICS is a hybrid model of galaxy formation which combines cosmological DM
simulations with a semi-analytic description of baryonic processes. The model
is fully described in Hatton et al. (2003), and the version we use here is the
same as that used in the previous papers of the GALICS series (Hatton et al.
2003; Devriendt et al. 2005; Blaizot et al. 2004). We briefly recall the main
ingredients in Secs. 2.1 and 2.2. We have been led to change our prescription
for positioning galaxies within haloes since Hatton et al. (2003). We explain
our new prescription in Sec. 2.3 (see also Lanzoni et al. 2005). In this latter
section, we also present alternative positioning schemes that we will explore
in the following sections.

Finally, the GALICS outputs are turned into mock catalogues using MOMAF, as
explained in Sec. 2.4. We check in this latter section that the basic
properties of these mock catalogues, i.e. number counts and red-shift
distributions, are in agreement with SDSS data.



2.1 Dark matter 



The cosmological N-body simulation (Ninin 1999) we use throughout this paper
assumes a flat Cold Dark Matter cosmology with a cosmological constant
($\Omega_m = 1/3$, $\Omega_\Lambda = 2/3$), and a Hubble parameter
$h = H_0 / [100\ \mathrm{km\,s^{-1}\,Mpc^{-1}}] = 0.667$. The initial power
spectrum was taken



^2 http://galics.cosmologies.fr/
to be a scale-free ($n_s = 1$) power spectrum evolved as predicted by Bardeen
et al. (1986) and normalised to the present-day abundance of rich clusters with
$\sigma_8 = 0.88$ (Eke et al. 1996). The simulated volume is a cubic box of
side $L_b = 100\,h^{-1}$ Mpc, which contains $256^3$ particles, resulting in a
particle mass $m_p = 8.272 \times 10^9\,M_\odot$ and a smoothing length of
29.29 kpc. The density field was evolved from z = 35.59 to the present day, and
we output about 100 snapshots spaced logarithmically in the expansion factor.
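
As a consistency check on the simulation parameters, the particle mass follows from the mean matter density, the box volume and the number of particles. The sketch below is ours, not part of GALICS; it assumes the standard value of the critical density, 2.775 x 10^11 h^2 M_sun Mpc^-3, and the function name is hypothetical.

```python
RHO_CRIT_H2 = 2.775e11  # critical density in h^2 Msun / Mpc^3 (assumed standard value)

def particle_mass(omega_m, h, box_hinv_mpc, n_side):
    """Mass of one simulation particle, in solar masses."""
    box_mpc = box_hinv_mpc / h              # comoving box side in Mpc
    rho_m = omega_m * RHO_CRIT_H2 * h**2    # mean matter density in Msun / Mpc^3
    return rho_m * box_mpc**3 / n_side**3

m_p = particle_mass(omega_m=1.0 / 3.0, h=0.667, box_hinv_mpc=100.0, n_side=256)
print(f"m_p = {m_p:.3e} Msun")
print(f"20-particle halo = {20 * m_p:.3e} Msun")
```

The second printed value is the mass of the smallest group kept by a 20-particle threshold, which is what sets the mass resolution of the halo catalogue.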

In each snapshot, we identify haloes using a friends-of-friends (FOF) algorithm
(Davis et al. 1985) with a linking length parameter b = 0.2, only keeping
groups with more than 20 particles. At this point, we define the mass
$M_{FOF}$ of the group as the sum of the masses of the linked particles, and
the radius $R_{FOF}$ as the maximum distance of a constituent particle to the
centre of mass of the group. We then fit a tri-axial ellipsoid to each halo,
and check that the virial theorem is satisfied within this ellipsoid. If not,
we decrease its volume until we reach an inner virialised region. From the
volume of this largest ellipsoidal virialised region, we define the virial
radius $R_{vir}$ and mass $M_{vir}$. These virial quantities are the ones we
will use later to compute the cooling of the hot baryonic component. Once all
the haloes are identified and characterised, we build their merger history
trees by following all the constituent particles from snapshot to snapshot.
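
The friends-of-friends step can be illustrated with a toy implementation. This is a minimal sketch, not the halo finder used for the simulation: it uses a brute-force pair search, ignores periodic boundaries, and takes the linking length as b times the mean inter-particle spacing. The function name and arguments are hypothetical.

```python
import math

def friends_of_friends(points, box, b=0.2, min_members=20):
    """Group points closer than b times the mean inter-particle spacing.

    Brute-force O(N^2) search without periodic wrapping, for illustration only.
    Returns a list of groups, each a list of point indices.
    """
    n = len(points)
    link = b * box / n ** (1.0 / 3.0)        # linking length
    parent = list(range(n))                  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < link:
                parent[find(i)] = find(j)    # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) >= min_members]
```

With equal-mass particles, the FOF mass of a group is simply its number of members times the particle mass.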

2.2 Lighting up haloes 

The fate of baryons within the halo merger trees found above 
is decided according to a series of prescriptions which are 
either theoretically or phenomenologically motivated. The 
guideline - which is similar to other SAMs - is the following. 
Gas is shock-heated to the virial temperature when captured 
in a halo's potential well. It can then radiatively cool onto a 
rotationally supported disc, at the centre of the halo. Cold 
gas is turned into stars at a rate which depends on the dy- 
namical properties of the disc. Stars then evolve, releasing 
both metals and energy into the interstellar medium (ISM) , 
and in some cases blowing part of the ISM away back into 
the halo's hot phase. When haloes merge, the galaxies they 
harbour are gathered into the same potential well, and they 
may in turn merge together, either due to fortuitous col- 
lisions or to dynamical friction. When two galaxies merge, 
a "new" galaxy is formed, the morphological and dynami- 
cal properties of which depend on those of its progenitors. 
Typically, a merger between equal mass galaxies will give 
birth to an ellipsoidal galaxy, whereas a merger of a massive 
galaxy with a small galaxy will mainly contribute to devel- 
oping the massive galaxy's bulge component. The Hubble 
sequence then naturally appears as the result of the inter- 
play between cooling - which develops discs - and merging 
and disc gravitational instabilities - which develop bulges. 

Keeping track of the stellar content of each galaxy, as a function of age and
metallicity, and knowing the galaxy's gas content and chemical composition, one
can compute the (possibly extinguished) spectral energy distribution (SED) of
each galaxy. To this end, we use the STARDUST model (Devriendt et al. 1999),
which predicts the SED of an obscured stellar population from the UV to the
sub-mm.

The above modelling of galaxy formation provides us 
with a physically motivated HOD : it tells us how many 



galaxies one expects in each halo along with the properties 
of these galaxies. It also predicts the dispersion (and higher 
orders) of the HOD, as a result of each halo's individual for- 
mation history. In the GALICS model, the number of galaxies 
that populate a halo results from basically three ingredients. 
First, gas cools in haloes massive enough compared to the IGM temperature. This
is the source term and produces one (central) galaxy per (massive) halo.
Second, galaxies gather in the same structures when haloes merge. This is the
only way to get more than one galaxy per halo, and tends to yield a number of
satellite galaxies proportional to halo mass at high masses. Third,
galaxy-galaxy mergers are the only sink term (regardless of selection effects).
In a paper in preparation, we show that the HOD predicted by GALICS is in good
agreement with results from a smoothed-particle hydrodynamics cosmological
simulation, suggesting that the three above ingredients and their
implementation properly capture the physics that shapes the HOD.
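
Given a halo catalogue and the host halo of each galaxy, the mean and dispersion of the occupation number can be tabulated in bins of halo mass. The sketch below is an illustration under assumed inputs (a dictionary of halo masses and a list of host-halo identifiers); these data structures and the function name are hypothetical and do not reflect the GALICS database layout.

```python
from collections import Counter
from statistics import mean, pstdev

def occupation_stats(halo_masses, galaxy_halo_ids, mass_bin_edges):
    """Mean and dispersion of the number of galaxies per halo, per mass bin.

    halo_masses     : dict {halo_id: mass}
    galaxy_halo_ids : iterable with the host halo id of every galaxy
    mass_bin_edges  : increasing list of bin edges
    Returns a list of (low_edge, high_edge, mean_N, sigma_N) for non-empty bins.
    """
    n_gal = Counter(galaxy_halo_ids)
    stats = []
    for lo, hi in zip(mass_bin_edges[:-1], mass_bin_edges[1:]):
        # Haloes without galaxies count as zero occupation.
        counts = [n_gal.get(h, 0) for h, m in halo_masses.items() if lo <= m < hi]
        if counts:
            stats.append((lo, hi, mean(counts), pstdev(counts)))
    return stats
```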

2.3 Galaxy positions 

The position $p_g$ of a galaxy in the simulation volume can be written as
$p_g = p_h + \delta p$, where $p_h$ is the position of the centre of mass of
the host halo, and $\delta p$ the position of the galaxy within this halo.
While the positions of haloes are well known from the DM simulation, the
spatial distribution of galaxies inside their host haloes is not described by
DM-only simulations. One thus needs a model to predict each galaxy's
$\delta p$. In this paper, we investigate the effect of three
such models on the clustering properties of galaxies : (i) 
the "progenitor position interpolation" (PPI) implemented 
in the standard version of GALICS, (ii) a scheme in which 
galaxies follow DM within haloes (FOF), and (iii) a model in 
which galaxies are positioned on DM substructures (SUB). 
Seeing which scheme the SDSS data prefer will hopefully 
help us understand how galaxies are distributed within DM 
haloes. 

PPI - Because of the spherical symmetry assumption made in GALICS, a galaxy's
position in our model is described only by its orbital radius. We model two
processes that can affect a galaxy's distance to its halo's centre:
(i) dynamical friction brings galaxies to the centre, and (ii) halo mergers
heavily perturb galaxies' orbits. Because mergers are frequent, it is the
latter process that mostly determines the galaxy distribution. When two haloes
merge, the positions of the galaxies within the descendant halo are obtained by
interpolating the progenitors' positions using their velocities. In this paper,
we use a new prescription to reposition galaxies after halo mergers, which is a
modified version of that described in Hatton et al. (2003), designed to better
take into account the difference in masses of the merging haloes (see Lanzoni
et al. 2005). In practice, the displacement distance Rj of Hatton et al. (2003,
eq. 5.1) is now multiplied by a factor $(1 - M_{prog}/M_{son})$, where
$M_{prog}$ is the mass of either progenitor, and $M_{son}$ the mass of the
descendant. In this way, when a small halo merges with a much more massive one,
galaxies' orbits in the massive halo will change very little, whereas galaxies'
orbits in the small halo will change as before. The effect of this prescription
is to yield more concentrated galaxy distributions than the original
prescription did. Note however that the overall properties of galaxies are
almost identical to those presented in Hatton et al. (2003) and Blaizot et al.
(2004). An example of the galaxy distribution predicted by the "PPI" model is
shown in the upper-left panel of Fig. 1 for a halo of mass
$8.3 \times 10^{14}\,M_\odot$, containing 397 galaxies.
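
The effect of the mass-dependent factor can be seen on a simple example. The sketch below only applies the multiplicative factor quoted in the text; the displacement distance itself is defined in Hatton et al. (2003) and is treated here as a given input. The function name is hypothetical, and the example assumes that the descendant mass is the sum of the two progenitor masses.

```python
def damped_displacement(r_j, m_prog, m_son):
    """Displacement applied to the galaxies of a progenitor of mass m_prog
    when it merges into a descendant of mass m_son."""
    if not 0.0 < m_prog <= m_son:
        raise ValueError("expected 0 < m_prog <= m_son")
    return r_j * (1.0 - m_prog / m_son)

# A 1e12 Msun halo merging into a 1e14 Msun halo (descendant assumed to be the sum).
m_big, m_small = 1e14, 1e12
m_son = m_big + m_small
print(damped_displacement(1.0, m_big, m_son))    # galaxies of the massive halo: barely moved
print(damped_displacement(1.0, m_small, m_son))  # galaxies of the small halo: almost full shift
```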

FOF - The second positioning scheme explored in this paper, hereafter called
"FOF", consists in placing galaxies on random particles of the halo they belong
to, with the exception of the most massive galaxy, which is forced to lie at
the centre of mass of its halo. This is done as a post-treatment of the GALICS
outputs and thus has no effect on the physical properties of modelled galaxies.
For the same reason, though, the positions of satellite galaxies within haloes
are not related to their physical properties. The resulting distribution is
illustrated in the lower-left panel of Fig. 1. Several qualitative differences
with the PPI distribution can be noticed: (i) the shape of the distribution is
more complex (two cores, etc.), and (ii) it is much more concentrated near the
core(s). Note that this FOF distribution is the one most commonly used in HOD
implementations.
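
A minimal sketch of this post-treatment is given below. It assumes equal-mass particles, so that the centre of mass is the mean particle position, and it draws particles with replacement without enforcing any minimum separation between galaxies. The function name and the seed argument are hypothetical.

```python
import random

def place_on_particles(galaxy_masses, particles, seed=0):
    """Return one position per galaxy: the halo centre of mass for the most
    massive galaxy, a randomly drawn halo particle for every other galaxy."""
    rng = random.Random(seed)
    n = len(particles)
    # Equal-mass particles: centre of mass = mean position.
    centre = tuple(sum(p[k] for p in particles) / n for k in range(3))
    central = max(range(len(galaxy_masses)), key=galaxy_masses.__getitem__)
    return [centre if i == central else tuple(rng.choice(particles))
            for i in range(len(galaxy_masses))]
```

Because satellites are drawn independently of their properties, the resulting positions carry no information on galaxy mass or luminosity, as stated above.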

SUB - The third positioning prescription we explore consists in placing
galaxies on top of DM sub-structures (this will hereafter be referred to as
"SUB"). In this case, we assign galaxies to sub-structures as a function of
their masses: more massive galaxies go to more massive sub-structures. As a
result, the most massive galaxy of a halo naturally ends up at the centre of
mass of this halo. This procedure relies on the assumption that the mass of a
sub-structure roughly scales with that of the galaxy it contains. This
assumption is definitely questionable, since sub-structures orbiting within the
main halo are much more efficiently tidally stripped than the galaxies they
harbour (e.g. Springel et al. 2001; Diemand et al. 2004; Nagai & Kravtsov
2005). We thus expect our procedure to induce a significant depletion of
identified galaxies in the cores of massive haloes. This should have some
impact on the measurement of the two-point correlation function and higher
order statistics at small scales. In particular, we expect that the SUB scheme
will lead to an under-estimate of the small-scale clustering signal, with
respect to what would be found with a full hydro-dynamical treatment as in
Nagai & Kravtsov (2005). Still, the exercise is interesting because our SUB and
FOF schemes are expected to closely bracket the "true" distribution of
galaxies.

Also note that in practice, the number of substructures within a halo may
differ from the number of SAM galaxies it contains. This is an issue when there
are fewer sub-structures than galaxies. In this case, the extra (low-mass)
galaxies are given the positions of random dark matter particles as in the FOF
scheme. Fortunately, for our SDSS mock catalogues, such misidentifications are
rare enough, as shown in the Appendix. The upper-right panel of Fig. 1 shows
the "SUB" distribution of galaxies in the same halo as before. This
distribution lies somewhere in between the FOF and PPI pictures. The
identification of substructures is done using the AdaptaHOP code (Aubert et al.
2004), as described in the Appendix.
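
The rank-ordered assignment, including the fallback for haloes with fewer sub-structures than galaxies, can be sketched as follows. This is an illustration only: the function and argument names are hypothetical, and sub-structure identification (done with AdaptaHOP in the paper) is assumed to have been carried out beforehand.

```python
import random

def assign_to_substructures(galaxy_masses, sub_masses, sub_positions,
                            particles, seed=0):
    """Rank-order matching: the k-th most massive galaxy is placed on the
    k-th most massive sub-structure; leftover galaxies go to random particles."""
    rng = random.Random(seed)
    gal_order = sorted(range(len(galaxy_masses)),
                       key=lambda i: galaxy_masses[i], reverse=True)
    sub_order = sorted(range(len(sub_masses)),
                       key=lambda j: sub_masses[j], reverse=True)
    positions = [None] * len(galaxy_masses)
    for rank, i in enumerate(gal_order):
        if rank < len(sub_order):
            positions[i] = sub_positions[sub_order[rank]]
        else:
            # Fewer sub-structures than galaxies: fall back to the FOF scheme.
            positions[i] = rng.choice(particles)
    return positions
```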

Finally, note that in the 3 different positioning schemes, 
we do not allow for galaxies to overlap, i.e. we impose a 
minimum distance between galaxies of twice their sizes. 
Figure 1. Projected positions of galaxies (dots) in a
$8.3 \times 10^{14}\,M_\odot$ halo containing 397 galaxies. The upper-left
(resp. upper-right, lower-left) panel shows the PPI (resp. SUB, FOF) galaxy
distribution. The lower-right panel shows the distribution of dark-matter
particles, for comparison. The virial radius of this halo is
$1.6\,h^{-1}$ Mpc.



2.4 Mock catalogues 

We use the random tiling technique described in MOMAF to build mock
observations from the redshift outputs of GALICS. We mimic the SDSS early data
release by constructing catalogues of 2.5 x 90 square degrees, limited in
apparent magnitude at r = 22. As explained in MOMAF, several different
observing cones can be generated from the same set of outputs of GALICS, by
changing either the line of sight or the seed for the random tiling. We build
20 cones with seeds and lines-of-sight chosen randomly for each positioning
scheme. These 20 cones allow us to estimate the dispersion in clustering
measurements, that is, the cosmic variance associated with our mock catalogues.
However, given the rather small size of the simulation box, $100\,h^{-1}$ Mpc
on a side, this estimate is likely to be biased and has to be taken as a lower
bound on the cosmic errors.
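
The mean, dispersion and envelope of a statistic measured on a set of mock catalogues can be computed per angular bin as in the sketch below. The input layout (one list of values per mock, all on the same angular bins) and the function name are assumptions made for illustration.

```python
from statistics import mean, stdev

def summarise_mocks(measurements):
    """measurements[k][i] is the statistic measured in mock k at scale i.

    Returns, for each scale, (mean, sample standard deviation, min, max);
    min and max give the envelope of the measurements.
    """
    per_scale = list(zip(*measurements))   # transpose: one tuple of mocks per scale
    return [(mean(v), stdev(v), min(v), max(v)) for v in per_scale]
```

As noted above, the dispersion obtained this way should be read as a lower bound, since all the cones are drawn from the same finite simulation box.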

In Table 1 we give some geometrical characteristics of our mock catalogues. The
first line gives the median red-shift ($z_{med}$) of each apparent-magnitude
selection. The second and third lines give the angular size ($\theta_b$) of our
simulated volume at $z_{med}$ and the corresponding number of boxes required to
fill the observing cone in its largest dimension
($N_t \times \theta_b \simeq 90$ deg). Both these quantities give an idea of
the importance of finite volume and replication effects, which tend to reduce
the amplitude of the N-point correlation functions as well as that of the
measured cosmic variance on their estimates (see Blaizot et al. 2005 for a
thorough discussion of these effects). The fourth and fifth lines give similar
quantities, this time along the line-of-sight. The sixth and seventh lines give
the completeness limits at $z_{med}$ in terms of
Figure 2. Comparison of number counts from GALICS (solid histogram) with those
from the SDSS early data release (crosses with error bars, taken from Yasuda et
al. 2001), in the r filter.

absolute rest-frame magnitudes in the r-band ($r_{res}$) and in the I-band
($I_{res}$). Fainter than these limits, our sample of galaxies is incomplete
due to resolution effects: we miss part of the galaxies because they would lie
in unresolved DM haloes. The last line of Table 1 gives the observer-frame
absolute magnitude corresponding to the faint boundary of the selection at
$z_{med}$ in each magnitude bin. This magnitude should be compared to $r_{res}$
at low red-shifts and to $I_{res}$ at higher red-shifts. The comparison then
tells us whether the sample of galaxies we select with the apparent-magnitude
cut is complete. As can be seen from the two last columns of Table 1, our
samples of galaxies become incomplete faint-wards of r ~ 20.
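
The angular size of the simulation box at a given red-shift, and the number of boxes needed across a 90 deg strip, can be estimated from the comoving distance in the adopted flat cosmology. The sketch below is ours and is not the MOMAF computation: it uses a simple midpoint-rule integration and a small-angle approximation, and the function names are hypothetical. The red-shifts in the loop are the median red-shifts of the four magnitude bins quoted in the text.

```python
import math

C_OVER_H100 = 2997.92458   # c / (100 km/s/Mpc), i.e. the Hubble distance in h^-1 Mpc

def comoving_distance(z, omega_m=1.0 / 3.0, n_steps=1000):
    """Line-of-sight comoving distance in h^-1 Mpc for a flat universe
    (Omega_Lambda = 1 - Omega_m), integrated with the midpoint rule."""
    dz = z / n_steps
    total = 0.0
    for k in range(n_steps):
        zk = (k + 0.5) * dz
        total += dz / math.sqrt(omega_m * (1.0 + zk) ** 3 + 1.0 - omega_m)
    return C_OVER_H100 * total

def box_angular_size_deg(z, box=100.0):
    """Angle subtended by one box side (in h^-1 Mpc) at red-shift z,
    in the small-angle approximation."""
    return math.degrees(box / comoving_distance(z))

for z_med in (0.22, 0.31, 0.41, 0.55):
    theta_b = box_angular_size_deg(z_med)
    print(f"z = {z_med}: theta_b = {theta_b:.1f} deg, "
          f"boxes across 90 deg = {math.ceil(90.0 / theta_b)}")
```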

Before approaching clustering statistics, one should first check that one-point
statistics (number counts and red-shift distributions) are in agreement with
the data. In Fig. 2 we show the comparison of GALICS counts in the r band
(solid line) with the SDSS observations (crosses with error bars) taken from
Yasuda et al. (2001). These counts were measured on one mock SDSS stripe. Our
model slightly over-estimates the observed counts at all magnitudes, by
~0.1 dex in number or ~0.2 mag in magnitude. The reasons for this over-estimate
are not obvious. It is partly due to an over-estimate of the present-day
luminosity function, and possibly to a slightly wrong redshift evolution
(although see the discussion of the 2dF N(z) in MOMAF). The important point
here is that the counts match observations well enough for our purposes.
Indeed, we show in Sec. 3.1 that such a small error in the number counts does
not affect our conclusions concerning clustering.

In Fig. 3 we show the red-shift distributions of modelled galaxies selected in
four apparent-magnitude bins (hereafter "standard" magnitude bins). The solid
line shows the red-shift distribution of galaxies with apparent r magnitude
between 18 and 19, the dashed line is for 19 < r < 20, the dot-dashed line for
20 < r < 21, and the dotted line for 21 < r < 22. The median red-shifts of
these samples are respectively $z_{med}$ = 0.22, 0.31, 0.41, 0.55, as indicated
with the vertical lines in Fig. 3. The median red-shift of the brightest bin is
consistent with $z_{med}$ = 0.18 given by


Figure 3. Red-shift distributions of modelled galaxies selected in 
four apparent-magnitude bins. The solid line (resp. dashed, dot- 
dashed, dotted) corresponds to 18 < r < 19 (resp. 19 < r < 20, 
20 < r < 21, 21 < r < 22). 

Connolly et al. (2002). Again, these redshift distributions were obtained from
a single mock catalogue. The (small) high-z bump seems to be a general trend of
our model (it appears in most of our 20 mocks), and is not due to a particular
super-structure. We have checked that this anomaly has no effect on our
clustering estimates by computing $w(\theta)$ with and without galaxies in the
high-z tails: the results are indistinguishable.



3 TWO-POINT CORRELATION FUNCTION 

In this section, we first show that our clustering results do not depend much
on the uncertainty in the counts. Then, we present the angular correlation
function (ACF) we obtain with the PPI scheme for positioning galaxies, and
discuss its agreement with SDSS data. Finally, we explore how the three
positioning schemes affect the ACF at small scales.

3.1 ACF estimate and robustness 

We compute the ACF $w(\theta)$ using the estimator proposed by Landy & Szalay
(1993). However, instead of counting pairs, we use a fast Fourier transform
(FFT) scheme which is much faster when the number of galaxies is large (Szapudi
et al. 2001). This method requires one to project the apparent galaxy density
onto a grid, the cell-size of which sets a lower limit to the scales one can
probe. We therefore project each mock SDSS strip on rectangular grids of
16384 x 455 cells, which correspond to cells of size ~ 20 arc-seconds.
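
For reference, the Landy & Szalay estimator reads w = (DD - 2 DR + RR) / RR, where DD, DR and RR are the normalised data-data, data-random and random-random pair counts in a given angular bin. The sketch below evaluates it from raw pair counts; it deliberately uses direct pair counts rather than the FFT scheme adopted in the paper, and the function name is hypothetical. It also checks the cell size implied by the quoted grid on a 2.5 x 90 deg strip.

```python
def landy_szalay(dd, dr, rr, n_data, n_random):
    """Landy & Szalay (1993) estimator for one angular bin.

    dd, dr, rr are raw pair counts (data-data, data-random, random-random);
    they are normalised by the total number of pairs of each kind.
    """
    dd_n = dd / (n_data * (n_data - 1) / 2.0)
    dr_n = dr / (n_data * n_random)
    rr_n = rr / (n_random * (n_random - 1) / 2.0)
    return (dd_n - 2.0 * dr_n + rr_n) / rr_n

# Cell size implied by a 16384 x 455 grid on a 90 x 2.5 deg strip.
cell_long = 90.0 * 3600.0 / 16384    # arcsec along the 90 deg side
cell_short = 2.5 * 3600.0 / 455      # arcsec along the 2.5 deg side
print(f"{cell_long:.2f} x {cell_short:.2f} arcsec")
```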

We do not attempt to correct for the integral constraint, as Scranton et al.
(2002) showed that it is negligible. Moreover, because we mimic the geometry of
the SDSS, we are
affected by the same integral constraint as observations. A 
direct comparison of both raw estimates then makes more 
sense. Also, we do not estimate errors analytically, as they 
are in principle fully contained in the dispersion of our mea- 
surements among the 20 mock catalogues. Remember how- 
ever that we expect this dispersion to give a lower bound on 
^a Median red-shift of galaxies in each magnitude bin, as shown by the vertical
lines in Fig. 3.
^b Angular size of the simulated volume at $z_{med}$, in degrees.
^c Number of boxes tiled across the line of sight, in the direction where the
observing cone is 90 deg wide.
^d Co-moving distance from the observer to $2 \times z_{med}$, in $h^{-1}$ Mpc.
^e Number of boxes tiled along the line of sight.
^f Completeness magnitude limit in the SDSS r band (rest-frame).
^g Completeness magnitude limit in the F814W band from HST (rest-frame).
^h Absolute (observer-frame) magnitudes corresponding to the fainter boundary
of each apparent magnitude bin, at the corresponding median red-shift.

Table 1. Summary of the limitations of our simulation, in terms of volume
(first five rows) and mass resolution (last three rows). The angular size of
our simulation allows us to probe clustering up to scales ranging from 1 to
0.6 degrees from the brightest to the faintest apparent magnitude bins.
Moreover, the mass resolution guarantees that our samples of galaxies are
complete in the two brightest magnitude bins, while we certainly miss part of
the galaxies at fainter fluxes.



the errors rather than a true estimate, due to the finite size 
of the simulation box. 

As mentioned earlier, the counts from GALICS differ slightly from those given
by Yasuda et al. (2001). The significance of this discrepancy is not very clear
and could possibly be due to e.g. the definition of magnitudes, or
contamination by stars (Yasuda et al. 2001; Scranton et al. 2002). A full
treatment of photometric errors is however beyond the scope of this paper.
Instead, we show that our results are not very sensitive to apparent magnitude
uncertainties. To do this, we use a single mock catalogue to compute the
angular correlation function for galaxies in the four standard magnitude bins
(18 < r < 19, 19 < r < 20, 20 < r < 21, 21 < r < 22) and in magnitude bins
shifted by -0.2 mag (17.8 < r < 18.8, 18.8 < r < 19.8, 19.8 < r < 20.8,
20.8 < r < 21.8). This shift is about what is required for our model counts to
fit the SDSS counts. We show the results in Fig. 4 for one of the PPI
catalogues. The solid lines correspond to the standard magnitude bins, and the
dashed ones to the shifted magnitudes. Naturally, we find that brighter
galaxies (the shifted bins) are more clustered than fainter ones. However, the
shape of the correlation function is not affected, and the difference in
amplitude is very small. Similar results would be obtained with the FOF and SUB
schemes to position galaxies. Our conclusions are thus robust in this respect.
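
The selection used for this robustness test amounts to shifting both limits of every magnitude bin by the same amount before selecting galaxies. A minimal sketch, with hypothetical function names and a plain list of apparent r magnitudes as input, is:

```python
STANDARD_BINS = [(18.0, 19.0), (19.0, 20.0), (20.0, 21.0), (21.0, 22.0)]

def shifted_bins(bins, shift):
    """Shift every (bright, faint) magnitude limit by `shift` magnitudes."""
    return [(lo + shift, hi + shift) for lo, hi in bins]

def select(r_mags, lo, hi):
    """Indices of galaxies with lo < r < hi."""
    return [i for i, r in enumerate(r_mags) if lo < r < hi]

# Bins shifted by -0.2 mag, as in the test described above.
BRIGHTER_BINS = shifted_bins(STANDARD_BINS, -0.2)
```

The correlation function is then measured separately on the galaxies returned by `select` for each standard bin and for its shifted counterpart.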



3.2 Results for the PPI scheme 

In Fig. 5 we show the mean ACF (solid line) and its dispersion (dark grey
region) from 20 mock catalogues built using the PPI positioning scheme. The
light grey areas show the envelopes of the measurements. The diamonds with
error bars are taken from Connolly et al. (2002). Now, some
Figure 4. Angular correlation functions of galaxies in different
apparent-magnitude bins using the PPI scheme to locate galaxies. The solid
lines show the ACF of galaxies with 18 < r < 19, 19 < r < 20, 20 < r < 21 and
21 < r < 22, from top to bottom. The dashed lines show the ACFs of galaxies
with 17.8 < r < 18.8, 18.8 < r < 19.8, 19.8 < r < 20.8 and 20.8 < r < 21.8,
from top to bottom. An uncertainty of 0.2 mag translates into very small
changes in $w(\theta)$.

explanations might be useful to understand how good the 
match actually is between our model and the observations. 

At large scales, typically larger than ~ $\theta_b/10$, where
$\theta_b$ is the angular size of our simulated volume at the median
redshift of a considered magnitude bin, finite volume effects affect
our estimates of the ACF (see Table 1 for numerical values of
$\theta_b$). This effect was discussed at length in Blaizot et al.
(2005) and is responsible for the (small) under-estimate of the ACF at
large scales. Interestingly, the finite volume limit of our simulation
is close to that of the observations, since the observed stripe is
2.5 degrees wide. The disagreement of our model with data at large
scales is thus well understood in terms of finite volume effects,
hence it does not point to any failure in the GALICS model.
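
As an illustration of the $\theta_b/10$ criterion, the following minimal sketch estimates the largest trustworthy angular scale for a simulation box seen at a given median redshift. It assumes a flat universe with a cosmological constant; the function names, the matter density, the box size and the redshift in the example call are placeholder assumptions chosen for illustration, not values taken from this paper.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s


def comoving_distance(z, omega_m=0.3, n=2048):
    """Line-of-sight comoving distance in h^-1 Mpc for a flat universe
    with matter density omega_m and a cosmological constant."""
    zz = np.linspace(0.0, z, n)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zz) ** 3 + (1.0 - omega_m))
    # trapezoidal integration of dz / E(z)
    integral = np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zz))
    return C_KM_S / 100.0 * integral


def max_trusted_angle_deg(box_size, z_med, omega_m=0.3):
    """Return theta_b / 10 in degrees, where theta_b is the angle
    subtended by a box of comoving side box_size (h^-1 Mpc) at z_med."""
    theta_b = box_size / comoving_distance(z_med, omega_m)  # radians
    return np.degrees(theta_b) / 10.0


# Hypothetical example: a 100 h^-1 Mpc box seen at z_med = 0.2
print(max_trusted_angle_deg(100.0, 0.2))
```

In a flat universe the transverse comoving distance equals the line-of-sight one, which is why a simple ratio gives the subtended angle in the small-angle limit.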

At faint magnitudes, especially in the two faintest apparent
magnitude bins, incompleteness - due to mass resolution - sets in
progressively, and is responsible for the amplitude over-estimate,
which increases faint-wards. The three bottom lines of Table 1 give us
insight into this bias. As discussed in Sec. 2.4, faint-wards of
r ~ 20, one finds $M_r > M_{\rm res}$, which means that the samples of
galaxies selected in the two faint magnitude bins are incomplete. This
incompleteness is such that the selected galaxies inhabit a population
of haloes biased towards high masses. Now, as massive haloes cluster
more than low-mass ones, increasing incompleteness implies an
increasing positive bias in the ACF. This is what happens at r > 20.
However, our results in the apparent-magnitude range [18; 20] are
robust, and the amplitude of the ACF found for these samples is in
good agreement with observations.

At small scales, typically smaller than the angular size of a group
(i.e. ~ $\theta(1\,{\rm Mpc})$ at $z_{\rm med}$), our predicted ACF
under-estimates the observed one. We will show in the next subsection
that this bias can be attributed to an over-diluted distribution of
galaxies within haloes of DM.

The discussion above shows that our results are in good 
agreement with observations, in the rather restricted domain 
where our model and catalogues are valid. We can neverthe- 
less certainly improve the situation, as we discuss in the 
following subsection. 

3.3 Exploring the small scale galaxy distribution 

The so-called "halo occupation distribution" (HOD) formalism has
proven to be quite helpful in terms of understanding the origin of the
clustering properties of galaxies. In the HOD framework, galaxy
clustering is the result of three ingredients: (i) the spatial
distribution of haloes, (ii) the number of galaxies per halo, and
(iii) the distribution of galaxies within haloes. In the present
study, the distribution of haloes is drawn from a cosmological DM
simulation. Except for the very small difference between the
"concordance model" and the cosmological parameters we assume, point
(i) is thus certainly the least questionable part of this work. The
number of galaxies that each halo harbours is a more difficult issue,
as it is the result of our complex semi-analytic post-processing.
Moreover, this quantity is very difficult to constrain observationally
(see however van den Bosch et al. 2003). In a paper in preparation, we
compare the HOD obtained with GALICS to that obtained with a
cosmological smoothed-particle hydrodynamics (SPH) simulation, and
find very good agreement. This, combined with the numerous statistics
that have been checked for our model (Hatton et al. 2003; Blaizot et
al. 2004; Lanzoni et al. 2005), gives us confidence in the fact that
we predict the right number of galaxies per halo. Only point (iii)
then remains to explain the small-scale discrepancy shown in the
previous section between our model and observations. One of the
interesting results of the HOD formalism is to decompose the
correlation function into two terms. A term due to pairs of galaxies
located in different haloes (the 2-halo term) dominates at large
separations, and a term due to pairs of galaxies populating the same
halo (the 1-halo term) dominates the clustering signal at small scales
(see e.g. Berlind & Weinberg 2002). The 1-halo term is mainly
due to galaxies that lie in groups or clusters, and is sensitive
to the way galaxies are spatially distributed within these
massive haloes. Already from Fig. one can have a feeling 
of what is happening : the distribution of galaxies predicted 
by the PPI scheme within groups and clusters is less concen- 
trated than that obtained with the two other schemes (FOF 
and SUB). This will naturally lead to an under-estimate of 
the 1-halo term, and so to an under-estimate of the ACF at 
small scales. 
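
The 1-halo/2-halo split can be illustrated with a short brute-force sketch that separates galaxy pairs according to whether both members share a host halo. This is only a toy illustration under stated assumptions: it works on three-dimensional positions rather than angular separations, it scales as the square of the number of galaxies, and the function and argument names are hypothetical rather than taken from the GALICS or MOMAF codes.

```python
import numpy as np


def split_pair_counts(pos, halo_id, bin_edges):
    """Histogram pair separations, split into same-halo (1-halo)
    and different-halo (2-halo) pairs.

    pos       : (n, 3) array of galaxy positions
    halo_id   : (n,) array giving the host halo of each galaxy
    bin_edges : increasing separation bin edges
    """
    pos = np.asarray(pos, dtype=float)
    halo_id = np.asarray(halo_id)
    # indices of all distinct pairs (i < j)
    i, j = np.triu_indices(len(pos), k=1)
    sep = np.linalg.norm(pos[i] - pos[j], axis=1)
    same = halo_id[i] == halo_id[j]
    one_halo, _ = np.histogram(sep[same], bins=bin_edges)
    two_halo, _ = np.histogram(sep[~same], bins=bin_edges)
    return one_halo, two_halo


# Two small haloes of two galaxies each, 10 length units apart
pos = [[0, 0, 0], [0.1, 0, 0], [10, 0, 0], [10.1, 0, 0]]
one_halo, two_halo = split_pair_counts(pos, [0, 0, 1, 1], [0.0, 1.0, 20.0])
print(one_halo, two_halo)
```

Making the galaxy distribution inside each halo more or less concentrated only moves pairs between the small-separation bins of the same-halo histogram, which is why the positioning scheme affects the small-scale signal and leaves large scales unchanged.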

In the top panels of Fig. 3 we compare the ACFs obtained with the
three positioning schemes proposed in Sec. 2.3. The solid (resp.
dashed, dot-dashed) lines show the mean PPI- (resp. FOF-, SUB-) ACF
from the 20 mock catalogues described in Sec. 2.4. This comparison
tells us several things. First, changing the distribution of galaxies
within haloes does indeed change the behaviour of the ACF at small
separations, although it leaves the ACF unchanged at large scales, as
expected. Second, the FOF scheme yields an ACF which over-estimates
the observed ACF at small scales. If not a well known result, this is
at least a feature which is commonly found in the literature (e.g.
Benson et al. 2000; Scoccimarro et al. 2001; Berlind et al. 2003;
Weinberg et al. 2004; Yang et al. 2004). Although Yang et al. (2004)
interpreted this feature as a hint that the normalisation of the
power-spectrum ($\sigma_8$) is over-estimated in the concordance
model, our analysis suggests another explanation which simply relies
on the distribution of galaxies within haloes. This explanation is
also supported by the work of Kauffmann et al. (1999a), who find a
spatial correlation function in agreement with observations. In their
work, the positions of galaxies are obtained by following the
most-bound particles of the haloes in which they were formed. This is,
in essence, similar to following sub-structures, except that it allows
one to follow them below mass resolution, and to bypass the expensive
identification of sub-structures. Third, the SUB scheme gives a result
intermediate between FOF and PPI, but still not in agreement with the
data: it yields a depletion of the two-point correlation function at
small separations, similar to PPI. This was expected, as discussed in
Sec. 2.3, due to the fact that sub-structures are tidally stripped as
they spiral towards the centres of massive haloes. As a result, the
number of pairs found at small separations with our SUB scheme is
smaller than what we would expect from the real galaxy distribution.
This effect is also increased by artificial phase-space heating due to
N-body relaxation. In reality, even if a sub-halo is tidally stripped,
the galaxy it hosts still exists. However, at variance with pure dark
matter, galaxies can experience non-trivial collisions that would
presumably reduce slightly their concentration in the centre of rich
haloes, which gives a likely explanation for the fact that FOF
overestimates the ACF at small separations.

It is then hard to find a way to populate haloes with
finite resolution DM simulations only. Even following sub-haloes
dynamically as in Springel et al. (2001) requires the
Figure 5. Angular correlation function measured from our mock catalogues for galaxies selected in four apparent magnitude bins. We
computed $w(\theta)$ for 20 independent observing cones of 2.5 x 90 square degrees. The solid lines show the mean value, the dark grey area
shows the dispersion among the 20 cones, and the light grey region shows the envelope of measures. In each panel, the diamonds with
error bars show the SDSS ACF measured by Connolly et al. (2002).



use of a proxy when sub-haloes dissolve: galaxies are then
associated to the locally most bound particle. It is however
not clear how this proxy behaves once sub-structures are
disrupted - although long relaxation times suggest that the
trajectories of once most-bound particles should be a good
approximation. In view of this effect, the agreement found
by Kauffmann et al. (1999a) can be understood as the result
of an average between our FOF and SUB biasing schemes,
confirming our above statement that the SUB and FOF
prescriptions narrowly bracket the real solution.



To summarise the results, although the accuracy 
reached in this paper cannot really help us to rigorously dis- 
entangle the SUB and FOF schemes, our measurements con- 
firm well known results of the literature : (i) sub-structures 
are non trivially biased tracers of galaxies, and (ii) galaxies 
are distributed inside haloes very much like dark matter, but 
in a slightly less concentrated way. In the next section, we 
explore how this assertion resists the additional constraints 
from higher-order clustering. 



4 HIGHER-ORDER STATISTICS 

Because gravity has long pulled structures harbouring galax- 
ies away from possible initial Gaussianity, the distribution of 
galaxies is not fully characterised by the two-point correla- 
tion function alone. Instead, higher-order correlations have 
become non-zero and encapsulate the details of the small- 
scale non-linear galaxy distribution. It is thus very impor- 
tant to confront higher-order predictions from our model 
to observational determinations. In this section, we first
explain the counts-in-cells method that we used to measure
high-order clustering on SDSS DR1 and on mock catalogues.
Then, we briefly discuss the estimate made on SDSS DR1.
Finally, we compare results obtained with SDSS DR1
and our mocks in order to understand whether this new set
of constraints can help discriminate between our SUB and
FOF schemes.

4.1 The counts in cells method 

The probability distribution of counts in cells (CIC), $P_N(\theta)$,
is the probability that an angular cell of (linear) dimension $\theta$
contains $N$ galaxies. The factorial moments of this distribution are
defined by $F_k = \sum_N P_N (N)_k$, where
$(N)_k = N(N-1)\ldots(N-k+1)$ is the $k$-th falling factorial of $N$.
The factorial moments are closely related to the moments of the
underlying continuum random field (which is assumed Poisson-sampled by
the galaxies), $\rho = \langle N\rangle(1+\delta)$, through
$\langle(1+\delta)^k\rangle = F_k/\langle N\rangle^k$ (Szapudi &
Szalay 1993), where angle brackets in the last relation denote an area
average over cells of size $\theta$. The factorial moments therefore
provide a convenient way to estimate the angular connected moments,
$S_p = \langle\delta^p\rangle_c / \langle\delta^2\rangle^{p-1}$, where
the subscript $c$ denotes the connected contribution, and
$\langle\delta^p\rangle_c$ denotes the area average (over scale
$\theta$) of the $p$-point angular correlation function. The moments
$S_3$ (skewness) and $S_4$ (kurtosis) quantify the lowest-order
deviations of the angular distribution from a Gaussian.

It is straightforward to calculate factorial moments from
the distribution of CIC, and one can then use the recursion
relation of Szapudi & Szalay (1993) to obtain the $S_p$'s.
This technique is described in more complete detail in
Szapudi et al. (1996) and Szapudi et al. (2001). The most
delicate and time consuming component of estimating the
cumulants $S_p$ is then the accurate measurement of the CIC
distribution.
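
The chain from cell counts to $S_3$ and $S_4$ can be sketched in a few lines. This is a minimal illustration, not the estimator used in this paper: it takes a plain list of counts (one per cell, with no edge correction), and it converts moments to connected moments directly at low order instead of using the recursion relation cited above. The function names are hypothetical.

```python
import numpy as np


def factorial_moments(counts, kmax=4):
    """Return [F_1, ..., F_kmax], with F_k = <N(N-1)...(N-k+1)>
    averaged over the cells."""
    n = np.asarray(counts, dtype=float)
    moments = []
    falling = np.ones_like(n)
    for k in range(kmax):
        falling = falling * (n - k)  # (N)_{k+1} = (N)_k * (N - k)
        moments.append(falling.mean())
    return moments


def cumulant_ratios(counts):
    """Return (xi2, S_3, S_4) of the Poisson-sampled field delta."""
    f1, f2, f3, f4 = factorial_moments(counts, 4)
    # moments of (1 + delta): <(1 + delta)^k> = F_k / <N>^k
    m2, m3, m4 = f2 / f1**2, f3 / f1**3, f4 / f1**4
    # central moments of delta, using <delta> = 0
    mu2 = m2 - 1.0
    mu3 = m3 - 3.0 * m2 + 2.0
    mu4 = m4 - 4.0 * m3 + 6.0 * m2 - 3.0
    # connected moments: the 4th-order one removes the Gaussian part
    s3 = mu3 / mu2**2
    s4 = (mu4 - 3.0 * mu2**2) / mu2**3
    return mu2, s3, s4


# Toy input: half of the cells are empty, half contain 4 galaxies
print(cumulant_ratios([0, 4]))
```

Using factorial rather than ordinary moments of the counts is what removes the Poisson shot-noise contribution, so the ratios refer to the underlying continuous field.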

As shown in Szapudi & Colombi (1996), large-scale measurements are
dominated by edge effects which are impossible to correct for
exactly - even when using massive over-sampling. This stems from the
fact that, due to finite cell size, galaxies near the edge of the
survey (or near a masked-out region) receive a smaller statistical
weight than galaxies away from any edge. This has a devastating effect
when estimating CIC in galaxy surveys. Typically, across the whole
SDSS area, there are over 100 cut-out holes per square degree.
Consequently, a randomly placed cell of side ~ 0.1° has a high
probability of intersecting a mask. Now, because traditional CIC
techniques discard such cells, they would not be able to provide us
with measurements on scales larger than ~ 0.1 degree. To remedy this
situation, we measure the CIC distribution using a new estimator by
Colombi & Szapudi (2006, in prep.) and its implementation BMW-PN
(for Black-Magic-Weighted-PN). This estimator features a linear,
massively oversampling algorithm, and sports a new approximate edge
correction scheme.

The recipe implemented in BMW-PN gives approximately equal
weight to each galaxy during CIC estimation. While it was shown
previously that this is impossible under the most general
circumstances, the approximate scheme uses the fact that the CIC
distribution is fairly insensitive to cell shape (Szapudi 1998).
This empirical fact can be used for edge effect correction for the
special case of estimating galaxy CIC in the following way.
The data are pixelized on a
very fine grid, which will give CIC for the smallest possible 
scale, the grid step size. The same operation is performed 
for the masks. On these pixelized data, one considers all 
the possible square cells of all possible sizes, that are seen 
as ensembles of pixels. For each of these cells, an effective 
size is given, corresponding to the valid area it encompasses 
(overlapping pixelized masks are subtracted). Then the cen- 
ter of mass of the valid part of the cell is calculated, and one 
finds the pixel it falls into. With that procedure, a number 
of cells of a given effective scale will fall onto this same pixel. 
However, one is interested only in one cell, because one cell
per pixel is enough to extract all the available statistical
information at the chosen pixelization level. One thus selects
the cell which is the most compact one, or, in other words,
the initial square cell of smallest possible size before mask
area subtraction. This way, one increases the effective area
sampled by the cells and the amount of available statistics.
Moreover, due to the fact that only one cell at most is
allowed to contribute to a pixel, a more even weight is given
in practice to galaxies near the edge of the catalogue, which
reduces edge effects. It is however not easy to demonstrate
that analytically: only practical experiments show that it is
indeed the case (see Colombi & Szapudi 2006). This is why
the method is called "Black-Magic Weighting" (BMW).

A more detailed explanation of this estimator is given
by Colombi & Szapudi (2006), who performed a series of
tests based on simulated galaxy surveys, and masks lifted
from real galaxy surveys. They have found that the method
works with high precision. The control parameter, a number
between 0 and 1, determines the fraction of the cell allowed
to overlap with a mask. A value of 0 corresponds to no overlap
("classical" CIC estimation), while larger numbers turn on the
BMW. It was found that even at 75% allowed overlap, the
systematic errors introduced are negligible. To keep a margin of
safety, we have allowed 50% overlap in all the calculations
presented below.
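
The role of the overlap control parameter can be illustrated on a pixelized mask. The sketch below is not the BMW-PN estimator: it only computes, for every square cell of a given size in pixels, the fraction of its area that is masked, and keeps the cells whose masked fraction does not exceed the allowed overlap. The function names and the boolean-array representation of the mask are assumptions made for illustration.

```python
import numpy as np


def masked_fraction(mask, size):
    """Masked fraction of every size x size cell of a 2D boolean mask
    (True = masked pixel), computed with a summed-area table."""
    m = np.asarray(mask, dtype=float)
    # summed-area table padded with a leading row and column of zeros
    sat = np.zeros((m.shape[0] + 1, m.shape[1] + 1))
    sat[1:, 1:] = m.cumsum(axis=0).cumsum(axis=1)
    total = (sat[size:, size:] - sat[:-size, size:]
             - sat[size:, :-size] + sat[:-size, :-size])
    return total / size**2


def usable_cells(mask, size, max_overlap=0.5):
    """Boolean map of cells whose masked fraction is <= max_overlap.
    max_overlap = 0 reproduces the classical rule of discarding any
    cell that touches a mask."""
    return masked_fraction(mask, size) <= max_overlap


# Toy mask: a 4 x 4 pixel field with its top-left 2 x 2 corner masked
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
print(usable_cells(mask, 2, max_overlap=0.0).sum(),
      usable_cells(mask, 2, max_overlap=0.5).sum())
```

In this toy case, allowing partial overlap recovers cells that the classical rule would discard, which is the gain in usable area described above.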



4.2 The SDSS-DR1 data set

The first major SDSS data release (Abazajian et al. 2003, DR1)
covers 2099 square degrees and contains over 53 million objects.
In our analysis, we include galaxies of the eight northern stripes
9-12, 34-37 and three southern ones 76, 82, 86 - that is, all DR1
stripes except the shortest ones, 42 and 43. While we use some data
outside the DR1, our area adds up to marginally less than the total
area of DR1. The galaxies were split into four apparent magnitude
bins that can be compared to previous results of other surveys as
well as the early SDSS measurements by Szapudi et al. (2002).
The numbers of galaxies with dereddened r model magnitudes
between 18-19, 19-20, 20-21 and 21-22 are 732,216,
2,047,766, 5,455,559 and 10,890,300, respectively. That is
altogether more than 19.1 million galaxies, which is an order
of magnitude more than the largest higher order statistical
study to date.

The database also holds the relevant information about
areas on the sky that are to be censored in any type of
statistical study of the spatial distribution of galaxies. Bright
stars, satellites, airplanes and bad seeing account for an
approximately 12% loss in the area of DR1. These masked regions on
the sky are extremely hard to deal with in CIC measurements,
as explained in Sec. 4.1.

Note that DR4 is now publicly available. We however still use
DR1 because it is largely sufficient for the level of detail we wish
to reach, given the size of the simulation used.

Since we have measured CIC in 11 virtually independent SDSS stripes,
we were able to estimate the variance in a fairly robust fashion, by
taking the unbiased dispersion over the 11 stripes and dividing the
corresponding error by a factor $\sqrt{11}$ (see e.g. Colombi,
Szapudi & Szalay 1998). All error bars have been determined this way.
For the mock catalogues, the errors are determined from the
dispersion obtained from 20 random stripes without further
renormalization: the error estimated this way assumes only one
stripe. Indeed, the simulation used to generate the mock catalogues
is too small to give a fair estimate of the errors on a full
11-stripe catalogue.
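
The stripe-based error bar described above amounts to a standard error of the mean. A minimal sketch, assuming the measurements are stored with one row per stripe and one column per angular scale (the function name and array layout are hypothetical):

```python
import numpy as np


def stripe_mean_and_error(values):
    """Mean over stripes and its error: the unbiased dispersion over
    the stripes divided by the square root of their number.

    values : array of shape (n_stripes,) or (n_stripes, n_scales)
    """
    v = np.asarray(values, dtype=float)
    mean = v.mean(axis=0)
    # ddof=1 gives the unbiased dispersion estimate
    err = v.std(axis=0, ddof=1) / np.sqrt(v.shape[0])
    return mean, err


# Toy example with three stripes and a single scale
print(stripe_mean_and_error([1.0, 2.0, 3.0]))
```

For the mock catalogues the division by the square root of the number of stripes would be skipped, since the quoted dispersion there refers to a single stripe.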

We performed a series of measurements on the DR1
data set, using the CIC method. We extracted $S_3$ and
higher-order cumulants from the CIC statistics. All the
measurements were carried out with masks corresponding to
seeing limits FWHM = 1.7, 1.8, 1.9, 2.0 arcsec.

We have found that seeing has only minor effects
for the most part, therefore we present measurements for
FWHM = 1.7 only (shaded areas in Fig. 6).

4.3 Mocks vs. DR1

In Fig. 6, diamonds (resp. stars, triangles) show the mean
values of $S_3$ (resp. $S_4$, $S_5$), obtained from the 20 mock
catalogues. The symbols connected with continuous (resp.
dashed, dot-dashed) lines correspond to the PPI (resp. FOF,
SUB) schemes for positioning galaxies within haloes. The
dispersion of the $S_n$ estimates is shown with error bars for the
SUB case only, for the sake of clarity. Measurements from the
SDSS DR1 are shown with the shaded and hatched areas.
These were obtained with the 1.7 arcsecond seeing masks.
Note that errors computed for the predictions are larger than
for the data. As already mentioned above, the reason is that in
the first case the errors are obtained from the dispersion over
the 20 simulated stripes, while the errors in the second case take
into account the fact that there are 11 stripes in the DR1 survey,
hence corresponding to errors ~ $\sqrt{10}$ ~ 3 times smaller. Note
that some data points are not shown for the model at large and small
scales. Points were removed when the associated error bars became
too large.

As for the 2-point angular correlation, model results in
the two faintest magnitude bins are strongly affected by
incompleteness. This again leads to an over-estimate of the
$S_n$ coefficients, increasing with apparent magnitude. Similarly,
large scales are affected by finite volume effects. The
poor agreement between the models and the observations
at large scales, even in the upper panels of Fig. 6, can thus
certainly be blamed on finite volume/edge effects as
mentioned earlier. It remains quite acceptable given the error
bars. The fact that cumulants from the three different
positioning schemes converge only at rather large scales - larger
than for the 2-point correlation function - is just because
these quantities are cell averages, i.e. integrated from 0 to
$\theta$. At brighter magnitudes (r < 20), the model's predictions
are robust, as discussed for the ACF. Fig. 6 then tells us the
following.

(i) The FOF and SUB schemes show a similarly good agreement
with observations and are almost indistinguishable
from each other given the level of uncertainty on the
measurements. In principle, one might expect significant
differences between FOF and SUB at small scales, as found for
the two-point correlation function, but the effect seems to be
of the same order of magnitude on $\langle\delta^p\rangle_c$ and
$\langle\delta^2\rangle^{p-1}$ and thus disappears in the
normalisation.

(ii) We recover in the two upper panels of Fig. 6 the fact
found in the previous section, that the PPI scheme leads,
as expected, to a strong under-estimate of the observed $S_n$
coefficients. The effect is strongest at small scales, while
PPI converges to FOF and SUB at large scales, when data
points are available.

(iii) Finally, the agreement of GALICS with the DR1 estimates
is quite a success, provided that modelled galaxies are
distributed as sub-structures or DM within haloes.



5 CONCLUSIONS 

In this paper, we have used a novel technique for constructing
mock SDSS-like observations from the predictions of a hybrid model
of galaxy formation (Hatton et al. 2003; Blaizot et al. 2005).
Although mock observations have been made in the past from hybrid
models of galaxy formation (e.g. Diaferio et al. 1999; Mathis et al.
2002), we emphasize that our method is general and can readily be
used to reproduce any type of extra-galactic survey. We have used
these mock observations to carry out a detailed comparison of the
clustering properties of galaxies observed in the SDSS to
those predicted by a state-of-the-art implementation of the
hierarchical galaxy formation scenario. We have carefully
investigated the limitations of our model, which are mostly
due to mass resolution and finite volume of the DM simulation.
Mass resolution directly translates into incompleteness at faint
apparent magnitudes, such that the selected galaxies inhabit haloes
biased towards high masses. This in turn leads to an increasing
over-estimate of the clustering statistics faintwards. Our
predictions are robust, though, at magnitudes brighter than r ~ 20.
The finite volume of the simulation introduces a negative bias in
clustering statistics at large scales, well known as the "integral
constraint" problem. We have shown in Blaizot et al. (2005) how this
affects angular correlation function estimates from mock catalogues,
and can thus safely define a validity scale range for our
predictions, which typically extends up to a tenth
of the apparent size of the simulated volume at the median
redshift of the selected sample. Within the rather restricted
domain where the model predictions are valid, we find a good
agreement with the observed angular two-point correlation
function.

At small scales - typically < 1 $h^{-1}$ Mpc - our standard
PPI positioning scheme is found to under-estimate the 2-point
correlation function. This can be explained by the
fact that this modelling of galaxy positions within haloes
yields too diluted a distribution of galaxies within groups
and clusters. We thus investigated the impact of changing
the spatial distribution of galaxies within haloes on the ACF
and found that observations can be explained if galaxies
have a distribution somewhere between that of DM particles
and DM sub-structures, as suggested by the early results
of Kauffmann et al. (1999a).

Moving to higher-order statistics, we can robustly rule
out the PPI scheme. The uncertainty on the measurements
does not allow us to discriminate between the FOF and SUB
schemes. This work shows that modelling the distribution
of galaxies within massive haloes is a difficult task. In
particular, the instantaneous view of the DM distribution is
not enough (yet) to populate haloes, because (i) the
positions of galaxies are the result of an evolutionary process
Figure 6. Third (resp. fourth, fifth) order cumulants measured in 20 mock catalogues are shown as diamonds (resp. stars, triangles).
Solid (resp. dashed, dot-dashed) lines connect symbols corresponding to the PPI (resp. FOF, SUB) scheme. The error bars associated
with symbols show the dispersion around the average estimate. The four panels correspond to different apparent magnitude selections,
as in Fig. m The shaded regions show the locus of SDSS measurements. 



and (ii) the sub-structures that harbour them dissolve if
prohibitively high resolution is not used. A straightforward
biasing scheme based either on DM particles or on DM
sub-structures is thus, as already found in the literature, a poor
proxy for positioning galaxies, and we show that it leads
to an over-estimate (respectively an under-estimate) of the
clustering signal at small separations. Existing attempts to
follow the dynamics of galaxies within DM-only simulations
still suffer from resolution effects (Springel et al. 2001). As
a result, most galaxies in the core regions of massive haloes
are attached to particles (once most-bound) rather than
sub-structures. This makes it necessary for HOD models to
incorporate in some way the evolution of the sub-haloes, e.g.
by keeping track of once most-bound particles, just like in
SAMs.

Most of the limitations of the present work are due to
the rather small range where our model's predictions are
robust, which is in turn mainly due to the properties of the
DM simulation we use. Namely, mass resolution and finite
volume effects do not allow us to make full use of the wealth
of data obtained by the SDSS. One obvious way to improve
the situation is to use bigger simulations. In this respect,
the so-called "millennium simulation" from the Virgo
consortium will undoubtedly help us progress on the
interpretation of the clustering properties of galaxies in the nearby
Universe. The mass resolution of this simulation is about f 
times better than that of the simulation used in this work. 
This should allow us to make better use of the observations
in the apparent-magnitude range 20 < r < 22. In addition, the
volume of the millennium simulation is 125 times larger, which
should allow (i) better estimates of cosmic variance, and
(ii) robust characterisation of the large scale distribution of
galaxies (alleviating finite volume effects). The sheer
statistics from this simulation should also allow us to carry out a
more subtle study of the dependence of galaxy clustering on
various galaxy properties (e.g. luminosities, colours, age,
morphological types, etc.), thereby allowing us to set constraints
on the baryonic physics of galaxy formation.
APPENDIX A: SUBSTRUCTURES
A1 Identification

We identified sub-structures in our DM simulation using the
AdaptaHOP code (Aubert et al. 2004, appendix B), which
is an extension of the HOP halo finder (Eisenstein & Hut
1998). This algorithm exploits basic principles of Morse
theory to extract a tree of structures (the haloes) and
sub-structures from a distribution of DM particles. It proceeds
in four steps, which are the following.

(i) One needs to estimate the local density associated to
each DM particle using smoothed-particle hydrodynamics
(SPH) interpolation over $N_{\rm SPH}$ neighbours. Here, we take
$N_{\rm SPH} = 20$ in order to match the FOF halo population, as
explained below. During this process, one should store the
$N_{\rm HOP}$ nearest neighbours for later use. In this paper, we
take $N_{\rm HOP} = 16$ as advocated by Eisenstein & Hut (1998).

(ii) One locates the "leaves" of the tree, i.e. the most
elementary substructures, by associating groups of particles to
local SPH density maxima. This step is performed by a walk
from particle to particle, the next particle being the one with
the maximum SPH density among the particle itself and its
$N_{\rm HOP}$ neighbours.

(iii) One can then establish the connectivity between these
"peak-patches" by locating saddle points at the boundaries
of the above regions.

(iv) Finally, one builds the tree of structures and
substructures as a function of density threshold using the saddle
points to determine if two sub-structures are connected or
not.

Note that we use a criterion relying on local Poisson
noise in order to assess a sub-structure's statistical
significance: basically, a structure of density $\rho$ and with $N$
particles must be at a 4-$\sigma$ level compared to the local
background, $\rho_b$, to be significant:
$\rho > \rho_b \times (1 + 4/\sqrt{N})$.
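
Step (ii) and the significance criterion can be sketched as follows. This is a toy illustration and not the AdaptaHOP code: it assumes that densities and neighbour lists have already been computed (step i), and the function names and array layout are hypothetical.

```python
import numpy as np


def hop_to_peaks(density, neighbours):
    """For each particle, follow the walk towards the densest particle
    among itself and its neighbours until a local maximum is reached.

    density    : (n,) array of local densities
    neighbours : (n, n_hop) integer array of neighbour indices
    Returns the index of the density peak each particle is attached to.
    """
    density = np.asarray(density, dtype=float)
    neighbours = np.asarray(neighbours)
    n = len(density)
    # candidates for the next hop: the particle itself, then its neighbours
    cand = np.column_stack([np.arange(n), neighbours])
    nxt = cand[np.arange(n), np.argmax(density[cand], axis=1)]
    peak = nxt.copy()
    while True:
        new = nxt[peak]  # hop once more from the current position
        if np.array_equal(new, peak):
            return peak
        peak = new


def is_significant(rho, rho_b, n_part, n_sigma=4.0):
    """Poisson-noise criterion: rho > rho_b * (1 + n_sigma / sqrt(N))."""
    return rho > rho_b * (1.0 + n_sigma / np.sqrt(n_part))


# Toy 1D chain of five particles, each linked to its left and right
# neighbours (end particles list their single neighbour twice)
peaks = hop_to_peaks([1, 3, 2, 5, 4],
                     [[1, 1], [0, 2], [1, 3], [2, 4], [3, 3]])
print(peaks)
```

Particles sharing the same peak index form one "leaf"; steps (iii) and (iv), which connect leaves through saddle points, are not covered by this sketch.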



A2 Cross-match with FOF groups

Our FOF haloes being identified with a linking-length parameter
b = 0.2, we identify haloes with AdaptaHOP as connected regions of
SPH density larger than $\rho_{\rm th} = 81$ (e.g. Eisenstein & Hut
1998). An additional fine-tuning of the match between FOF and
AdaptaHOP halo populations requires using $N_{\rm SPH} = 20$, in
agreement with the minimum number of particles allowed in FOF haloes.
Still, the haloes produced by AdaptaHOP are unfortunately slightly
different from the FOF ones. We thus associate AdaptaHOP
sub-structures to FOF haloes according to one simple rule: a
sub-structure is associated to the FOF halo which contains
most of its particles. This prescription has the advantage
of avoiding mis-identifications, especially in rich environments,
near massive groups or clusters.

Also, because of mass resolution, nothing guarantees a
priori that we detect enough sub-structures to fit in all galaxies
detectable by the SDSS. We address this issue in Fig. A1 for
galaxies with 18 < r < 19 (left-hand-side panel) and
19 < r < 20 (right-hand-side panel). The solid lines show
the fraction of haloes containing more detected galaxies than
sub-structures in one of our mock catalogues. Only haloes
actually containing at least one galaxy were considered for
the normalisation of this fraction. The up-right hatched region
shows the redshift distribution of haloes containing at
least one detected galaxy. Considering this region and the
solid curve, one sees that about 1% of the haloes do not contain
enough sub-structures in both magnitude ranges. The
dashed line shows the fraction of galaxies which are not
associated to sub-structures in the same mock catalogue, that
is, the fraction of galaxies that we have to distribute on random
halo particles. The redshift distribution of galaxies in
the same catalogue is shown with an arbitrary normalisation
with the down-right hatches. Again, only about 1% of detected
galaxies are not associated with sub-structures. These
two plots show that the level of contamination of the SUB
scheme by particles is at the ~1% level at most. In other
words, the clustering signal obtained with the SUB scheme
does indeed come from sub-structures.
Figure A1. Dashed lines show the fraction of selected galaxies which are not associated with a sub-structure. Solid lines show the fraction
of haloes containing more detected galaxies than sub-structures. The hatched area shows the arbitrarily scaled redshift distribution of
galaxies (down-right hatches) and haloes (up-right hatches) in the catalogue. Less than 1% of galaxies are not associated with
sub-structures in our catalogues, at r < 20. NB: in the right-hand-side panel, the truncation of the redshift distributions is only an artifact
of the plotting routine.



